Abstract: This paper investigates the minimum mean square
error (MMSE) estimation of x, given the observation y = Hx + n,
when x and n are independent and Gaussian mixture (GM)
distributed. The introduction of GM distributions represents a
generalization of the more familiar and simpler case of Gaussian
signal and Gaussian noise. We present the necessary theoretical
foundation and derive the MMSE estimator for x in closed
form. Furthermore, we provide upper and lower bounds for its
mean square error (MSE). These bounds are validated through
Monte Carlo simulations.

I. Introduction

In estimation theory, an important model is the Bayesian 
linear model 

y = Hx+n, (1) 

where y is a vector of observations, H is a known matrix, 
x is the vector to be estimated and n is additive noise. If x 
and n are mutually independent Gaussian variates, then the 
minimum mean square error (MMSE) estimator for x is well
known and quite tractable, see e.g. [1].

There are, however, often good reasons to go beyond the
Gaussian setting. For one, x and n may not be Gaussian.
For another, the distributions of x and n may even be
multimodal. For these reasons, and for the sake of greater
generality, the purely Gaussian perspective is relaxed in this
paper.

The extension considered below maintains independence
between x and n, but now each vector variate originates from
a finite Gaussian mixture (GM) distribution. Specifically,

$$ x \sim \sum_{k \in \mathcal{K}} p_k \, \mathcal{N}\big(u_x^{(k)}, C_{xx}^{(k)}\big) \quad \text{and} \quad n \sim \sum_{l \in \mathcal{L}} q_l \, \mathcal{N}\big(u_n^{(l)}, C_{nn}^{(l)}\big), \qquad (2) $$

where the notation should be read in the distributional sense: x
originates, with prior probability $p_k$, from a Gaussian source
with distribution law $\mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$. Naturally, we require
$\sum_k p_k = 1$ and $p_k \geq 0$. The noise, n, emerges in a similar
but independent manner. $\mathcal{K}$ and $\mathcal{L}$ are finite index sets. Their
cardinalities determine the number of Gaussian components in
the mixtures. Clearly, when $\mathcal{K}$ and $\mathcal{L}$ are singletons, we fall
back on the familiar case of Gaussian signal and Gaussian
noise. The component probabilities, component means and
component covariances ($p_k$, $u_x^{(k)}$ and $C_{xx}^{(k)}$) are collectively
referred to as the parameters of a Gaussian mixture.

Several properties speak in favor of GM distributions. An
important one is that a GM distribution can, in theory,
approximate any distribution with arbitrary accuracy. Said
differently, the closure of GM distributions on the vector space
X is the set of all probability distributions on X. Thus, for any
random vector $x \in X$ there exists a sequence of random
variables $x_n$, all of which are GM distributed, such that

$$ \lim_{n \to \infty} E\{g(x_n)\} = E\{g(x)\} \quad \text{for any bounded, continuous function } g : X \to \mathbb{R}. $$

Therefore, by judiciously choosing the number of components,
$|\mathcal{K}|$, and the corresponding parameters, the underlying input x
is approximated "in distribution" as closely as desired by a
Gaussian mixture. For a formal argument see e.g. [2]. The
intuition behind this asymptotic behavior is straightforward.
First, x can be approximated ad libitum by a mixture (a convex
combination) of Dirac measures. Second, each Dirac point
measure is approximated by a normal distribution having that
point as its mean and a small covariance.¹

A second reason for using GM distributions on x and n
in (2) is that this produces a posterior distribution on x|y
which is also a GM. An analytic posterior distribution is very 
attractive: it quantifies our degree of belief in x for any y, and 
any optimal Bayesian estimator (with respect to any criterion) 
may be derived from it. 

Last, but not least, it is easy to calculate the mean and 
covariance of mixture distributions. These crucial parameters 
are transferred from underlying components in convenient 
ways. So, to the extent that first- and second-order analysis is 
important (the MMSE estimator corresponds to the posterior 
mean), mixtures have a lot to offer. 

Admittedly, passing from a purely Gaussian model to
a corresponding GM model is not without challenges and
drawbacks. A notable one, as we shall see, is that the mean
square error of the MMSE estimator cannot be determined
analytically.

There exists some related work on this topic. In [3], [4] and
[5], it is shown that if two vectors x and y are jointly GM
distributed, then the conditional distribution for x|y is also a
GM. These works do not, however, explicitly assume that y
and x are related through a linear model like (1). In [6], [7],
[8], [9], linear models are assumed. In all of these works, x is a
GM, whereas n is purely Gaussian. For that simpler instance,
the analytic MMSE estimator for x is provided. In [10],
recursive estimation of a GM distributed state sequence from
GM distributed measurements is considered. The resulting
optimal estimator is termed a non-Gaussian Kalman filter.

¹ Approximating an arbitrary distribution by a GM distribution is generally
a non-trivial problem. This paper is, however, not about density
approximation/learning. Here we assume that x and n are associated with
known GM distributions. Whether these distributions are exact or
approximations is not the focus here.
The above-mentioned related works have three aspects in
common, all of which invite further investigation: (i)
they all assume that the observation noise is purely Gaussian
(which we regard as only a special case of GM noise), (ii)
the theoretical foundation upon which the presented estimators
rest is not explicitly presented, and, most importantly, (iii)
a proper analysis of the resulting mean square error (MSE)
is completely absent. For these reasons, a unified exposition
that includes the derivation of the MMSE estimator for GM input
and GM noise, its theoretical foundation, and an analysis of
its MSE deserves to be made explicit. To the best of our
knowledge, none exists in the literature.

In the next section, we present a theorem which compactly
states the main result of the paper: the analytical MMSE
estimator with upper and lower performance bounds. In section
III we derive the posterior distribution rigorously, relying on
the theory provided by the appendix. From the posterior, the
MMSE estimator follows naturally. This proves the first part
of the theorem. Section IV analyzes the MSE of the MMSE
estimator when the posterior is a GM, and shows that the MSE
cannot be determined in a closed analytic form. Instead, we
derive upper and lower bounds for the MSE, which proves
the second part of the theorem. In section V these bounds are
validated through Monte Carlo simulations, followed by the
conclusion in section VI.



II. The MMSE estimator with performance bounds 

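Theorem 1: If the data are described by the Bayesian linear
model (1), where H is a known matrix, and x and n are
independent and GM distributed as in (2), then the MMSE
estimator of x is

$$ \hat{x} = \sum_{k,l} \alpha^{(k,l)}(y) \left[ u_x^{(k)} + C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T + C_{nn}^{(l)} \right)^{-1} \left( y - H u_x^{(k)} - u_n^{(l)} \right) \right], \qquad (3) $$

where

$$ \alpha^{(k,l)}(y) = \frac{p_k q_l f^{(k,l)}(y)}{\sum_{r,s} p_r q_s f^{(r,s)}(y)}, $$

and $f^{(k,l)}(y)$ is a Gaussian probability density function (PDF)
in y with mean

$$ u_y^{(k,l)} = H u_x^{(k)} + u_n^{(l)} $$

and covariance

$$ C_{yy}^{(k,l)} = H C_{xx}^{(k)} H^T + C_{nn}^{(l)}. $$

The performance of the MMSE estimator, measured by its
MSE, $\epsilon^2 = E\{\|x - \hat{x}\|_2^2\}$, is lower and upper bounded by

$$ \sum_{k,l} p_k q_l \, \mathrm{Tr}\left( C_{xx}^{(k)} - C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T + C_{nn}^{(l)} \right)^{-1} H C_{xx}^{(k)} \right) \leq \epsilon^2 \leq \mathrm{Tr}\left( C_{xx} - C_{xx} H^T \left( H C_{xx} H^T + C_{nn} \right)^{-1} H C_{xx} \right). \qquad (4) $$

In (4), $\mathrm{Tr}(\cdot)$ denotes the trace operator, and

$$ C_{xx} = \sum_k p_k \left( C_{xx}^{(k)} + u_x^{(k)} u_x^{(k)T} \right) - u_x u_x^T, \qquad (5) $$

$$ u_x = \sum_k p_k u_x^{(k)}, \qquad (6) $$

$$ C_{nn} = \sum_l q_l \left( C_{nn}^{(l)} + u_n^{(l)} u_n^{(l)T} \right) - u_n u_n^T, \qquad (7) $$

$$ u_n = \sum_l q_l u_n^{(l)}. \qquad (8) $$

The proof of (3) is given in section III, whereas the proof of
(4) is given in section IV.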
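The estimator (3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gaussian_logpdf` and `gm_mmse_estimate` and the list-based parameter layout are hypothetical choices, and the weights are normalized in the log domain purely for numerical stability.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log-density of N(mean, cov) evaluated at the vector y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + maha)

def gm_mmse_estimate(y, H, p, mu_x, C_x, q, mu_n, C_n):
    """Posterior mean of x given y = Hx + n, for GM distributed x and n.

    p, mu_x, C_x: weights, means, covariances of the components of x.
    q, mu_n, C_n: weights, means, covariances of the components of n.
    Returns the estimate and the weights alpha^{(k,l)}(y), flattened over (k, l).
    """
    log_w, comp_means = [], []
    for pk, uk, Ck in zip(p, mu_x, C_x):
        for ql, ul, Cl in zip(q, mu_n, C_n):
            mean_y = H @ uk + ul              # component mean of y
            cov_y = H @ Ck @ H.T + Cl         # component covariance of y
            log_w.append(np.log(pk * ql) + gaussian_logpdf(y, mean_y, cov_y))
            # Component-wise linear estimate: u + C H^T (H C H^T + C_n)^{-1} (y - mean_y)
            comp_means.append(uk + Ck @ H.T @ np.linalg.solve(cov_y, y - mean_y))
    log_w = np.array(log_w)
    alpha = np.exp(log_w - log_w.max())       # subtract the max before exponentiating
    alpha /= alpha.sum()
    return alpha @ np.array(comp_means), alpha

# Single-component check: x ~ N(0, I), n ~ N(0, I), H = I gives x_hat = y / 2.
y = np.array([2.0, -4.0])
x_hat, alpha = gm_mmse_estimate(
    y, np.eye(2), [1.0], [np.zeros(2)], [np.eye(2)], [1.0], [np.zeros(2)], [np.eye(2)]
)

# Symmetric two-component scalar prior observed at y = 0: the estimate is 0.
x_sym, alpha_sym = gm_mmse_estimate(
    np.array([0.0]), np.eye(1),
    [0.5, 0.5], [np.array([3.0]), np.array([-3.0])], [np.eye(1), np.eye(1)],
    [1.0], [np.zeros(1)], [np.eye(1)],
)
```

With singleton index sets the weights collapse to one and the sketch reduces to the familiar linear Gaussian estimator.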

III. Deriving the analytical MMSE Estimator 

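Our assumption is that x and n are independent and GM
distributed as in (2). Then, by Proposition 4 from the appendix,
x and n are jointly GM distributed as

$$ \begin{bmatrix} x \\ n \end{bmatrix} \sim \sum_{k,l} p_k q_l \, \mathcal{N}\left( \begin{bmatrix} u_x^{(k)} \\ u_n^{(l)} \end{bmatrix}, \begin{bmatrix} C_{xx}^{(k)} & 0 \\ 0 & C_{nn}^{(l)} \end{bmatrix} \right). $$

Observe that equation (1) can be written as

$$ \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} H & I \\ I & 0 \end{bmatrix} \begin{bmatrix} x \\ n \end{bmatrix}. $$

Therefore, the joint vector $[y^T \; x^T]^T$ is a linear transform of
the GM distributed vector $[x^T \; n^T]^T$. By Proposition 5 of the
appendix, the joint vector $[y^T \; x^T]^T$ is GM distributed as well:

$$ \begin{bmatrix} y \\ x \end{bmatrix} \sim \sum_{k,l} p_k q_l \, \mathcal{N}\left( \begin{bmatrix} H u_x^{(k)} + u_n^{(l)} \\ u_x^{(k)} \end{bmatrix}, \begin{bmatrix} H C_{xx}^{(k)} H^T + C_{nn}^{(l)} & H C_{xx}^{(k)} \\ C_{xx}^{(k)} H^T & C_{xx}^{(k)} \end{bmatrix} \right). $$

We write the corresponding probability density function
compactly as

$$ f(y, x) = \sum_{k,l} p_k q_l f^{(k,l)}(y, x), $$

where $f^{(k,l)}(y, x)$ is a Gaussian density with mean

$$ \begin{bmatrix} u_y^{(k,l)} \\ u_x^{(k)} \end{bmatrix} = \begin{bmatrix} H u_x^{(k)} + u_n^{(l)} \\ u_x^{(k)} \end{bmatrix} $$

and covariance

$$ \begin{bmatrix} C_{yy}^{(k,l)} & C_{yx}^{(k)} \\ C_{xy}^{(k)} & C_{xx}^{(k)} \end{bmatrix} = \begin{bmatrix} H C_{xx}^{(k)} H^T + C_{nn}^{(l)} & H C_{xx}^{(k)} \\ C_{xx}^{(k)} H^T & C_{xx}^{(k)} \end{bmatrix}. $$

Using Proposition 6 of the appendix, the marginal density for
y is

$$ f(y) = \sum_{k,l} p_k q_l f^{(k,l)}(y), \qquad (9) $$

where $f^{(k,l)}(y)$ is a Gaussian density with mean $u_y^{(k,l)}$ and
covariance $C_{yy}^{(k,l)}$. That is,

$$ f^{(k,l)}(y) = \mathcal{N}\big(y; \, u_y^{(k,l)}, C_{yy}^{(k,l)}\big). \qquad (10) $$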
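The posterior density follows from Bayes' law as

$$ f(x|y) = \frac{f(y, x)}{f(y)} = \frac{\sum_{k,l} p_k q_l f^{(k,l)}(y, x)}{\sum_{r,s} p_r q_s f^{(r,s)}(y)} = \frac{\sum_{k,l} p_k q_l f^{(k,l)}(y) f^{(k,l)}(x|y)}{\sum_{r,s} p_r q_s f^{(r,s)}(y)} = \sum_{k,l} \alpha^{(k,l)}(y) f^{(k,l)}(x|y), \qquad (11) $$

where

$$ \alpha^{(k,l)}(y) = \frac{p_k q_l f^{(k,l)}(y)}{\sum_{r,s} p_r q_s f^{(r,s)}(y)}. \qquad (12) $$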



The weight, $\alpha^{(k,l)}(y)$, can be seen as the joint probability
of x originating from component k, and n originating from
component l, given the observation y. Note that these weights
are non-linear in the observation y, and satisfy $\alpha^{(k,l)}(y) \geq 0$
and $\sum_{k,l} \alpha^{(k,l)}(y) = 1$. In (11), $f^{(k,l)}(x|y)$ is a conditional
density of a multivariate Gaussian, $f^{(k,l)}(y, x)$. In that case,
$f^{(k,l)}(x|y)$ is known to be Gaussian (see e.g. Theorem 10.2
of [1]) with mean

$$ u_{x|y}^{(k,l)} = u_x^{(k)} + C_{xy}^{(k)} C_{yy}^{-(k,l)} \left( y - u_y^{(k,l)} \right) \qquad (13) $$

$$ \phantom{u_{x|y}^{(k,l)}} = u_x^{(k)} + C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T + C_{nn}^{(l)} \right)^{-1} \left( y - H u_x^{(k)} - u_n^{(l)} \right), \qquad (14) $$

and covariance

$$ C_{x|y}^{(k,l)} = C_{xx}^{(k)} - C_{xy}^{(k)} C_{yy}^{-(k,l)} C_{yx}^{(k)} \qquad (15) $$

$$ \phantom{C_{x|y}^{(k,l)}} = C_{xx}^{(k)} - C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T + C_{nn}^{(l)} \right)^{-1} H C_{xx}^{(k)}, \qquad (16) $$

respectively. Here, and later, $C_{yy}^{-(k,l)}$ is short for $\big(C_{yy}^{(k,l)}\big)^{-1}$.
The posterior density $f(x|y)$ of (11) is clearly a GM density.
By Proposition 1 of the appendix, its mean is

$$ u_{x|y} = E\{x|y\} = \sum_{k,l} \alpha^{(k,l)}(y) \, u_{x|y}^{(k,l)}, \qquad (17) $$

and, by Proposition 2 of the appendix, the covariance is

$$ C_{x|y} = E\{(x - E\{x|y\})(x - E\{x|y\})^T \,|\, y\} = \sum_{k,l} \alpha^{(k,l)}(y) \left( C_{x|y}^{(k,l)} + u_{x|y}^{(k,l)} u_{x|y}^{(k,l)T} \right) - u_{x|y} u_{x|y}^T. \qquad (18) $$

For the special case when $|\mathcal{K}| = |\mathcal{L}| = 1$ (Gaussian input
and Gaussian noise), the posterior density, $f(x|y)$, is purely
Gaussian. Then the mean (17) reduces to

$$ u_{x|y} = E\{x|y\} = u_{x|y}^{(k,l)}, \qquad (19) $$

and the covariance (18) reduces to

$$ C_{x|y} = C_{x|y}^{(k,l)}. \qquad (20) $$




A. The MMSE estimator 

The MMSE estimator corresponds to the posterior mean
given in (17), that is

$$ \hat{x}_{\mathrm{MMSE}} = u_{x|y}. $$

Inserting (14) into (17) proves (3) in Theorem 1. In the special
case when $|\mathcal{K}| = |\mathcal{L}| = 1$ and $f(x|y)$ is Gaussian, we note
from (19) and (14) that this estimator is linear in y, and from
(20) and (16) that the posterior covariance matrix does not depend
on y. The latter property makes it easy to characterize the MSE
of the estimator when $f(x|y)$ is Gaussian.

In the general case, when $f(x|y)$ is a multi-component GM,
the MMSE estimator (17) is non-linear in the observed data y,
because of the data dependent weights $\alpha^{(k,l)}(y)$. Furthermore,
because the posterior covariance $C_{x|y}$ of (18) depends on the
observation y, the MSE becomes considerably more difficult
to analyze, as we find in section IV.
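To make the non-linearity concrete, consider a worked special case. It is an illustration added here under stated assumptions, not part of the original derivation: x is scalar with two components, H = 1, and the noise is a single zero-mean Gaussian component with variance $\sigma^2$. The symbols $u_k$, $c_k$ and $\sigma^2$ are assumed names for the scalar component means, the component variances and the noise variance. Under these assumptions, (17) with (14) and (12) specializes to

```latex
\hat{x}(y) = \sum_{k=1}^{2} \alpha_k(y)
  \left[ u_k + \frac{c_k}{c_k + \sigma^2}\,(y - u_k) \right],
\qquad
\alpha_k(y) = \frac{p_k\,\mathcal{N}(y;\,u_k,\,c_k + \sigma^2)}
                   {\sum_{r=1}^{2} p_r\,\mathcal{N}(y;\,u_r,\,c_r + \sigma^2)}.
```

Each bracketed term is linear in y, but the weights $\alpha_k(y)$ themselves vary with y, so the overall estimator is a data-dependent blend of the two component-wise linear estimators.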



B. The maximum a posteriori (MAP) estimator 

Although this paper is not about MAP estimation, we
mention very briefly that the MAP estimator can be found
(which is perhaps not entirely evident when the distribution
is multimodal). The MAP estimate for x is

$$ \hat{x}_{\mathrm{MAP}} = \arg\max_x f(x|y). $$

Thus $\hat{x}_{\mathrm{MAP}}$ corresponds to the mode of $f(x|y)$. In the
special case when $|\mathcal{K}| = |\mathcal{L}| = 1$ and $f(x|y)$ is Gaussian,
the MAP and MMSE estimates for x coincide, because the
mode coincides with the mean. In the general case, however,
when $f(x|y)$ is given by (11), the posterior is a multimodal
GM distribution. The mode of such a distribution cannot be
expected to coincide with its mean. A procedure for finding
the MAP estimate is to find all the modes, and identify the
one at which the posterior density is largest. Finding the modes
of a GM distribution is a problem which has been well described
and solved in [11]. Therefore we do not discuss it further here.



IV. Error Analysis of the MMSE Estimator 
A. Mean Square Error 

For a given observation y, the MSE of the estimator in (17)
can be determined by the trace of $C_{x|y}$ of (18). Our main
interest is not in the MSE for a particular y, but rather in the
MSE averaged over all y. Said differently, we are interested
in the MSE matrix

$$ M = \int C_{x|y} f(y) \, dy = \int \left[ \sum_{k,l} \alpha^{(k,l)}(y) \left( C_{x|y}^{(k,l)} + u_{x|y}^{(k,l)} u_{x|y}^{(k,l)T} \right) - u_{x|y} u_{x|y}^T \right] f(y) \, dy. $$

Using (9) and (12), we obtain

$$ M = \sum_{k,l} p_k q_l \int \left( C_{x|y}^{(k,l)} + u_{x|y}^{(k,l)} u_{x|y}^{(k,l)T} - u_{x|y} u_{x|y}^T \right) f^{(k,l)}(y) \, dy. \qquad (21) $$


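We inspect the above integral term by term. The first term of
(21) is

$$ M_1 = \sum_{k,l} p_k q_l \int C_{x|y}^{(k,l)} f^{(k,l)}(y) \, dy = \sum_{k,l} p_k q_l C_{x|y}^{(k,l)}, \qquad (22) $$

where the last equality holds because $C_{x|y}^{(k,l)}$ is not a function
of y, as can be seen in (16). The second term of (21) is

$$ M_2 = \sum_{k,l} p_k q_l \int u_{x|y}^{(k,l)} u_{x|y}^{(k,l)T} f^{(k,l)}(y) \, dy. \qquad (23) $$

Inserting (13) into (23), we obtain

$$ M_2 = \sum_{k,l} p_k q_l \int \left[ u_x^{(k)} + C_{xy}^{(k)} C_{yy}^{-(k,l)} \left( y - u_y^{(k,l)} \right) \right] \left[ u_x^{(k)} + C_{xy}^{(k)} C_{yy}^{-(k,l)} \left( y - u_y^{(k,l)} \right) \right]^T f^{(k,l)}(y) \, dy $$

$$ \phantom{M_2} = \sum_{k,l} p_k q_l \left( u_x^{(k)} u_x^{(k)T} + C_{xy}^{(k)} C_{yy}^{-(k,l)} C_{yx}^{(k)} \right), \qquad (24) $$

where the last equality holds because $f^{(k,l)}(y)$ has mean $u_y^{(k,l)}$
and covariance $C_{yy}^{(k,l)}$, so that the cross terms vanish. By (15),
the second term inside the parentheses equals $C_{xx}^{(k)} - C_{x|y}^{(k,l)}$.
The third term of (21) is

$$ M_3 = - \int u_{x|y} u_{x|y}^T \sum_{k,l} p_k q_l f^{(k,l)}(y) \, dy. \qquad (25) $$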



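Note from (17) that

$$ u_{x|y} u_{x|y}^T = \left( \sum_{k,l} \alpha^{(k,l)}(y) u_{x|y}^{(k,l)} \right) \left( \sum_{r,s} \alpha^{(r,s)}(y) u_{x|y}^{(r,s)T} \right) = \frac{\left( \sum_{k,l} p_k q_l f^{(k,l)}(y) u_{x|y}^{(k,l)} \right) \left( \sum_{r,s} p_r q_s f^{(r,s)}(y) u_{x|y}^{(r,s)T} \right)}{\left( \sum_{k,l} p_k q_l f^{(k,l)}(y) \right)^2}. $$

Hence the integral in (25) can be written

$$ M_3 = - \sum_{k,l,r,s} p_k q_l p_r q_s \int \frac{f^{(k,l)}(y) f^{(r,s)}(y) \, u_{x|y}^{(k,l)} u_{x|y}^{(r,s)T}}{\sum_{i,j} p_i q_j f^{(i,j)}(y)} \, dy. $$

As far as we can see, this integral cannot be solved analytically,
meaning that we cannot determine the MSE matrix exactly.
Our main interest is in the trace of M, because this
corresponds to the MSE:

$$ \epsilon^2 = \mathrm{Tr}(M) = \mathrm{Tr}(M_1) + \mathrm{Tr}(M_2) + \mathrm{Tr}(M_3). $$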

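In the absence of an analytical expression for $\epsilon^2$, we pursue
upper and lower bounds, as follows. From equations (22), (23)
and (25), we note that

$$ \mathrm{Tr}(M_1) = \sum_{k,l} p_k q_l \, \mathrm{Tr}\big( C_{x|y}^{(k,l)} \big), $$

$$ \mathrm{Tr}(M_2) = \sum_{k,l} p_k q_l \int u_{x|y}^{(k,l)T} u_{x|y}^{(k,l)} f^{(k,l)}(y) \, dy, \qquad (26) $$

$$ \mathrm{Tr}(M_3) = - \sum_{k,l} p_k q_l \int u_{x|y}^T u_{x|y} f^{(k,l)}(y) \, dy, \qquad (27) $$

respectively. Since $p_k \geq 0$, $q_l \geq 0$, $f^{(k,l)}(y)$ is a PDF, $C_{x|y}^{(k,l)}$
is a covariance matrix, and $u_{x|y}^{(k,l)T} u_{x|y}^{(k,l)}$ and $u_{x|y}^T u_{x|y}$ are
inner products, it can be concluded that

$$ \mathrm{Tr}(M_1) \geq 0, \quad \mathrm{Tr}(M_2) \geq 0, \quad \mathrm{Tr}(M_3) \leq 0. \qquad (28) $$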

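Furthermore, from (26) and (27), we note that

$$ \mathrm{Tr}(M_2) + \mathrm{Tr}(M_3) = \sum_{k,l} p_k q_l \int \left( u_{x|y}^{(k,l)T} u_{x|y}^{(k,l)} - u_{x|y}^T u_{x|y} \right) f^{(k,l)}(y) \, dy $$

$$ = \int \sum_{k,l} \alpha^{(k,l)}(y) \left( u_{x|y}^{(k,l)T} u_{x|y}^{(k,l)} - u_{x|y}^T u_{x|y} \right) f(y) \, dy $$

$$ = \int \sum_{k,l} \alpha^{(k,l)}(y) \left\| u_{x|y}^{(k,l)} - u_{x|y} \right\|_2^2 f(y) \, dy \qquad (29) $$

$$ \geq 0, $$

where the second equality is obtained by using (9) and
(12); the third equality is obtained by using (17) and
$\sum_{k,l} \alpha^{(k,l)}(y) = 1$; and the inequality is obtained by using
$\alpha^{(k,l)}(y) \geq 0$. This, combined with the conditions in (28), gives
the following bounds

$$ \mathrm{Tr}(M_1) \leq \epsilon^2 \leq \mathrm{Tr}(M_1) + \mathrm{Tr}(M_2). \qquad (30) $$

By appropriate substitutions using (22) and (16), one obtains
the lower bound in (4) of Theorem 1.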

An alternative argument provides an intuition for the bounds
in (30). Imagine that side information is available in the
estimation process such that, for each observation y, a genie
tells us which single Gaussian component in (2) has generated
the underlying x, and also which single Gaussian component
has generated the underlying n. Said differently, for each y,
we face the familiar model of Gaussian signal and Gaussian
noise. Such a genie-aided estimator can be described as a
two-stage estimator consisting of (1) a perfect (error free) decision
device, followed by (2) a decision dependent Gaussian signal
and Gaussian noise MMSE estimator. In this (imaginary but
very favorable) case, we note that

$$ M = M_1 \quad \text{and hence} \quad \epsilon^2 = \mathrm{Tr}(M_1). $$

Without a genie, we must expect an error of at least $\mathrm{Tr}(M_1)$.
This implies that $\mathrm{Tr}(M_2) + \mathrm{Tr}(M_3) \geq 0$. Since $\mathrm{Tr}(M_3) \leq 0$,
we reach the same conclusions as in (30). In the next section,
we show that there exists a tighter upper bound than the one
in (30).

B. Tightening the Upper Bound 

The upper bound of (30), $\mathrm{Tr}(M_1) + \mathrm{Tr}(M_2)$, can in fact
be replaced by a tighter one. This can be seen by invoking
the following argument. Instead of using the optimal MMSE
estimator in (17), we could use a linear MMSE (LMMSE)
estimator. The LMMSE estimator is given by (see e.g. Theorem
12.1 of [1])

$$ \hat{x} = u_x + C_{xx} H^T \left( H C_{xx} H^T + C_{nn} \right)^{-1} \left( y - H u_x - u_n \right), \qquad (31) $$

with corresponding MSE matrix

$$ C_{xx} - C_{xx} H^T \left( H C_{xx} H^T + C_{nn} \right)^{-1} H C_{xx}. \qquad (32) $$

Here, $u_x$ and $C_{xx}$ are the mean and covariance of x, given
by (6) and (5) respectively, and $u_n$ and $C_{nn}$ are the mean and
covariance of n, given by (8) and (7) respectively.

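The MSE of the LMMSE estimator is given by the trace of
(32):

$$ \epsilon_L^2 = \mathrm{Tr}\left( C_{xx} - C_{xx} H^T \left( H C_{xx} H^T + C_{nn} \right)^{-1} H C_{xx} \right) = \mathrm{Tr}(C_{xx}) - \sum_j g_j^T \left( H C_{xx} H^T + C_{nn} \right)^{-1} g_j, \qquad (33) $$

where $g_j$ is the j-th column of $H C_{xx}$. In (33),
$(H C_{xx} H^T + C_{nn})$ is a positive semidefinite matrix (positive
definite when its inverse exists), which implies that

$$ \epsilon_L^2 \leq \mathrm{Tr}(C_{xx}). $$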

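Now, we compare this with $\mathrm{Tr}(M_1) + \mathrm{Tr}(M_2)$. Using (22),
(24) and (15), we may write

$$ \mathrm{Tr}(M_1) + \mathrm{Tr}(M_2) = \mathrm{Tr}\left( \sum_{k} p_k \left( C_{xx}^{(k)} + u_x^{(k)} u_x^{(k)T} \right) \right) = \mathrm{Tr}(C_{xx}) + u_x^T u_x \geq \mathrm{Tr}(C_{xx}), $$

where the second equality follows from using Proposition 2 of the
appendix, and the inequality holds with equality when $u_x = 0$.
Since we know that the LMMSE estimator cannot
outperform the optimal MMSE estimator, on average, we can
replace $\mathrm{Tr}(M_1) + \mathrm{Tr}(M_2)$ by the tighter bound $\epsilon_L^2$. Note that
$\epsilon_L^2$ in (33) corresponds to the upper bound in (4) of Theorem 1.
In summary, the performance of the optimal MMSE
estimator in (17) is lower bounded by a genie-aided MMSE estimator
and upper bounded by the LMMSE estimator.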
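Both bounds in (4) depend only on the mixture parameters and on H, so they can be evaluated without simulation. The following is a minimal sketch under stated assumptions: the function names `mixture_moments` and `mse_bounds` and the list-based parameter layout are hypothetical, and all matrices that are inverted are assumed to be non-singular.

```python
import numpy as np

def mixture_moments(w, means, covs):
    """Mean and covariance of a finite mixture, as in (5)-(8)."""
    means = [np.asarray(u, dtype=float) for u in means]
    covs = [np.asarray(C, dtype=float) for C in covs]
    mean = sum(wk * uk for wk, uk in zip(w, means))
    second = sum(wk * (Ck + np.outer(uk, uk)) for wk, uk, Ck in zip(w, means, covs))
    return mean, second - np.outer(mean, mean)

def mse_bounds(H, p, mu_x, C_x, q, mu_n, C_n):
    """Return (lower, upper): the genie-aided and the LMMSE bound of (4)."""
    lower = 0.0
    for pk, Ck in zip(p, C_x):
        for ql, Cl in zip(q, C_n):
            S = H @ Ck @ H.T + Cl
            # Component posterior covariance, as in (16)
            post = Ck - Ck @ H.T @ np.linalg.solve(S, H @ Ck)
            lower += pk * ql * np.trace(post)
    _, Cxx = mixture_moments(p, mu_x, C_x)
    _, Cnn = mixture_moments(q, mu_n, C_n)
    S = H @ Cxx @ H.T + Cnn
    upper = np.trace(Cxx - Cxx @ H.T @ np.linalg.solve(S, H @ Cxx))
    return lower, upper

# Scalar toy example (assumed values): means at +3 and -3, unit variances, unit noise.
lower, upper = mse_bounds(
    np.eye(1),
    [0.5, 0.5], [[3.0], [-3.0]], [np.eye(1), np.eye(1)],
    [1.0], [[0.0]], [np.eye(1)],
)
```

In this toy example the lower bound is 1/2 and the upper bound is 10/11; with singleton index sets the two bounds coincide.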



C. Simple Examples: High and Low SNR Cases 

Intuitively, one expects that the MSE approaches its lower
bound as the signal-to-noise ratio,

$$ \mathrm{SNR} = \frac{E\{\|Hx\|_2^2\}}{E\{\|n\|_2^2\}}, $$

goes to infinity, and its upper bound as the SNR goes to
zero. We will demonstrate that this is true for a simple, but
instructive, example. Throughout this example, we assume the 
noise to be distributed as 

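$$ n \sim a \sum_{l \in \mathcal{L}} q_l \, \mathcal{N}\big(u_n^{(l)}, C_{nn}^{(l)}\big) = \sum_{l \in \mathcal{L}} q_l \, \mathcal{N}\big(a u_n^{(l)}, a^2 C_{nn}^{(l)}\big), \qquad (34) $$

where a is a scalar which can be set to account for any SNR
level. Furthermore, we assume that H is a full rank square
matrix. Then (14) can be written

$$ u_{x|y}^{(k,l)} = u_x^{(k)} + C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T + a^2 C_{nn}^{(l)} \right)^{-1} \left( y - H u_x^{(k)} - a u_n^{(l)} \right). $$
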
1) High SNR: We drive the SNR towards infinity by
letting $a \to 0$. Then the above reads

$$ u_{x|y}^{(k,l)} = u_x^{(k)} + C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T \right)^{-1} \left( y - H u_x^{(k)} \right) = u_x^{(k)} + H^{-1} \left( y - H u_x^{(k)} \right) = H^{-1} y, \qquad (35) $$

where the second equality uses that H and $C_{xx}^{(k)}$ are invertible.
Thus, the component means of the posterior are all the
same. In that case we have $u_{x|y} = u_{x|y}^{(k,l)}$, and from (29) it
can be verified that $\mathrm{Tr}(M_2) + \mathrm{Tr}(M_3) = 0$. Hence, the MSE
will be determined by $\mathrm{Tr}(M_1)$ only, and by (30) it therefore
reaches the lower bound. This bound can be found using (16),
which in our case reduces to

$$ C_{x|y}^{(k,l)} = C_{xx}^{(k)} - C_{xx}^{(k)} H^T \left( H C_{xx}^{(k)} H^T \right)^{-1} H C_{xx}^{(k)} = C_{xx}^{(k)} - H^{-1} H C_{xx}^{(k)} = 0. $$

Inserting this into (22), and taking the trace, we find that the
lower bound of the MSE is zero. Note from (35) that the
estimator discards all prior knowledge and completely trusts
the data. This is expected at infinitely high SNR.

Finally, we remark that the MSE of the LMMSE estimator
will also be zero when the SNR goes to infinity. With n
distributed as in (34), the LMMSE estimator in (31) becomes

$$ \hat{x} = u_x + C_{xx} H^T \left( H C_{xx} H^T + a^2 C_{nn} \right)^{-1} \left( y - H u_x - a u_n \right). \qquad (36) $$

Letting $a \to 0$, this simplifies to

$$ \hat{x} = H^{-1} y. $$

But this is the same as (35). Hence, at very high SNR the
LMMSE estimator and the optimal MMSE estimator coincide,
and therefore have the same performance.

2) Low SNR: Here, it is convenient to rewrite (14) in an
alternative, but equivalent, form

$$ u_{x|y}^{(k,l)} = u_x^{(k)} + \left( C_{xx}^{-(k)} + H^T C_{nn}^{-(l)} H \right)^{-1} H^T C_{nn}^{-(l)} \left( y - H u_x^{(k)} - u_n^{(l)} \right), $$

where $C_{xx}^{-(k)}$ and $C_{nn}^{-(l)}$ are short for the inverses of $C_{xx}^{(k)}$ and $C_{nn}^{(l)}$.
With n distributed as in (34), this becomes

$$ u_{x|y}^{(k,l)} = u_x^{(k)} + \left( C_{xx}^{-(k)} + \frac{1}{a^2} H^T C_{nn}^{-(l)} H \right)^{-1} \frac{1}{a^2} H^T C_{nn}^{-(l)} \left( y - H u_x^{(k)} - a u_n^{(l)} \right). $$

When driving the SNR very low, by letting $a \to \infty$, this reduces
to

$$ u_{x|y}^{(k,l)} = u_x^{(k)}. $$

Thus, the MMSE estimate for x is

$$ u_{x|y} = \sum_{k,l} \alpha^{(k,l)}(y) \, u_x^{(k)} = \frac{\sum_{k,l} p_k q_l f^{(k,l)}(y) \, u_x^{(k)}}{\sum_{r,s} p_r q_s f^{(r,s)}(y)} = \sum_k p_k u_x^{(k)}. \qquad (37) $$

In (37), the last equality holds because $f^{(k,l)}(y)$ has
covariance

$$ C_{yy}^{(k,l)} = H C_{xx}^{(k)} H^T + a^2 C_{nn}^{(l)}, $$

and when $a \to \infty$, $f^{(k,l)}(y)$ approaches a uniform distribution
with infinite support. Hence, it approaches a constant which
is independent of y, k and l, and we may simply disregard
it. Note from (37) that the estimator discards the data and
uses only prior information, which is expected at zero SNR.
Now, we turn to the LMMSE estimator (36), which may be
rewritten equivalently as

$$ \hat{x} = u_x + \left( C_{xx}^{-1} + \frac{1}{a^2} H^T C_{nn}^{-1} H \right)^{-1} \frac{1}{a^2} H^T C_{nn}^{-1} \left( y - H u_x - a u_n \right). $$

With $a \to \infty$, this reduces to

$$ \hat{x} = u_x = \sum_k p_k u_x^{(k)}. \qquad (38) $$

But (38) is equal to (37). Thus, also at very low SNR, the
MMSE estimator and the LMMSE estimator coincide. In that
case, the error of the MMSE estimator coincides with $\epsilon_L^2$ in
(33), which corresponds to the upper bound.

In summary, in the asymptotic cases of infinite and zero 
SNR, the MMSE estimator attains minimum and maximum 
error respectively. In these extreme cases, one might just as 
well use the simpler LMMSE estimator, because it performs 
identically. 
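The two limits of the component posterior mean can be checked numerically. The sketch below is an illustration with arbitrarily chosen (assumed) parameter values and a hypothetical helper name, `component_posterior_mean`; it evaluates only the component mean (14) under the noise scaling (34), not the weights.

```python
import numpy as np

def component_posterior_mean(y, H, u_x, C_x, u_n, C_n, a):
    """Component mean (14) when the noise component is scaled by a, as in (34)."""
    S = H @ C_x @ H.T + a**2 * C_n
    return u_x + C_x @ H.T @ np.linalg.solve(S, y - H @ u_x - a * u_n)

# Assumed example values: H is square and full rank, C_x is invertible.
H = np.array([[2.0, 1.0], [0.0, 1.0]])
u_x = np.array([1.0, -1.0])
C_x = np.array([[2.0, 0.5], [0.5, 1.0]])
u_n = np.array([0.3, 0.1])
C_n = np.eye(2)
y = np.array([4.0, 2.0])

high = component_posterior_mean(y, H, u_x, C_x, u_n, C_n, a=1e-6)  # high SNR
low = component_posterior_mean(y, H, u_x, C_x, u_n, C_n, a=1e6)    # low SNR

print(high, np.linalg.solve(H, y))  # close to H^{-1} y, cf. (35)
print(low, u_x)                     # close to the prior component mean
```

For very small a the component mean is numerically indistinguishable from $H^{-1} y$, and for very large a it falls back on $u_x^{(k)}$, in line with the two limits derived above.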
Fig. 1. Estimate of the Bayesian MSE $\epsilon^2$, together with its upper and lower
bounds. H = I and uniform $p_k$.



V. Simulation Results 

We have shown that at infinite and zero SNR, the LMMSE
estimator is just as good as the MMSE estimator. Now we
demonstrate that at more realistic, intermediate SNRs, the
MMSE estimator certainly outperforms the LMMSE estimator.
We do this using Monte Carlo simulations. An estimate
of $\epsilon^2$ can be obtained by calculating the sample mean of
$\|x - u_{x|y}\|_2^2$ from many independent observations. The plot
in Figure 1 shows the lower bound, $\mathrm{Tr}(M_1)$, the upper bound
$\epsilon_L^2$, and an estimate of $\epsilon^2$, all in dB, versus an increasing SNR.
The SNR ranges from -10 dB to 50 dB in steps of 1 dB. The
following parameters have been used:

• H = I, with I being 5×5.

• x is GM distributed with $|\mathcal{K}| = 4$. The component means
are the columns of the following matrix:

$$ \begin{bmatrix} 35.381 & -47.087 & 79.522 & -30.903 \\ -20.184 & 0.286 & -51.577 & -5.826 \\ -6.377 & -68.308 & -17.330 & 3.246 \\ 24.419 & 4.400 & -7.422 & -101.586 \\ 38.891 & 1.195 & 9.282126 & -0.047508 \end{bmatrix} $$

These columns have simply been drawn independently
from $\mathcal{N}(0, 1000 I)$. We use component covariance
matrices $C_{xx}^{(k)} = I$, and uniform component probabilities
$p_k = 1/|\mathcal{K}| = 1/4$.

• Gaussian noise: $n \sim \mathcal{N}(0, \beta I)$. Proper adjustment of $\beta$
provides the required SNRs.

The estimated MSE ($\epsilon^2$) is obtained by averaging over
50000 independent y's for each SNR value. One observes
that Figure 1 is in line with our findings in section IV-C.
At low SNR, the MSE approaches its upper bound, and at
high SNR it approaches the lower, both of which coincide
with the MSE of the LMMSE estimator. Note, however, that
at intermediate SNR values, the optimal MMSE estimator
outperforms the LMMSE estimator (the upper bound) quite
substantially. Most impressively, for finite and quite
modest SNRs (approximately 10 dB and larger), the MMSE
estimator performs as if it were helped by a genie.
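The Monte Carlo procedure can be sketched as follows. This is not the authors' MATLAB code: it is a minimal Python illustration on an assumed scalar toy model (two prior components at ±3 with unit variance, unit-variance Gaussian noise, H = 1), and the function names `gm_sample` and `posterior_mean` are hypothetical. For this toy model the bounds in (4) evaluate to 1/2 and 10/11.

```python
import numpy as np

def gm_sample(rng, w, means, covs, size):
    """Draw `size` samples from a Gaussian mixture, one row per sample."""
    idx = rng.choice(len(w), size=size, p=w)
    out = np.empty((size, len(means[0])))
    for k in range(len(w)):
        sel = idx == k
        n_k = int(sel.sum())
        if n_k:
            out[sel] = rng.multivariate_normal(means[k], covs[k], size=n_k)
    return out

def posterior_mean(Y, H, p, mu_x, C_x, q, mu_n, C_n):
    """Row-wise MMSE estimates (17) for the observations stacked in Y."""
    log_w, means = [], []
    for pk, uk, Ck in zip(p, mu_x, C_x):
        for ql, ul, Cl in zip(q, mu_n, C_n):
            S = H @ Ck @ H.T + Cl
            R = Y - (H @ uk + ul)                 # residuals, one per row
            Sinv_R = np.linalg.solve(S, R.T).T    # rows hold S^{-1} r
            _, logdet = np.linalg.slogdet(S)
            # Log of p_k q_l f^{(k,l)}(y), up to a constant shared by all components
            log_w.append(np.log(pk * ql) - 0.5 * (logdet + np.sum(R * Sinv_R, axis=1)))
            means.append(uk + Sinv_R @ H @ Ck)    # u + C H^T S^{-1} r, row-wise
    log_w = np.stack(log_w)
    alpha = np.exp(log_w - log_w.max(axis=0))     # normalize weights per observation
    alpha /= alpha.sum(axis=0)
    return np.einsum('cn,cnd->nd', alpha, np.stack(means))

rng = np.random.default_rng(0)
H = np.eye(1)
p, mu_x, C_x = [0.5, 0.5], [np.array([3.0]), np.array([-3.0])], [np.eye(1), np.eye(1)]
q, mu_n, C_n = [1.0], [np.zeros(1)], [np.eye(1)]

N = 20000
X = gm_sample(rng, p, mu_x, C_x, N)
Y = X @ H.T + gm_sample(rng, q, mu_n, C_n, N)
X_hat = posterior_mean(Y, H, p, mu_x, C_x, q, mu_n, C_n)
mse_hat = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse_hat)  # expected to fall between the bounds 1/2 and 10/11
```

Repeating the last block for a range of noise variances would trace out a curve of the kind described for Figure 1.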

Without showing further plots, we remark that in the case 
when the component means of x have less variance (are less 
scattered) than in our example, then x is in principle more 
'Gaussian', and the MSE will be closer to the upper bound 
for all SNR values. Similarly, when the component means have 
larger variance (are more scattered) than in our example, then 
x becomes more distinctly GM distributed, and the MSE starts 
to drop from the upper bound at even lower SNR values. 

For the interested reader, the MATLAB code which produced
the plot in Figure 1 can be downloaded from:
http://sites.google.com/site/saikatchatt/softwares

VI. Conclusion 

We have provided the necessary theoretical foundation and 
derived the MMSE estimator from the Bayesian linear model, 
when both the noise and the signal have GM distributions. 
Furthermore, we have shown that the MSE of this estimator 
cannot be determined in closed form, but that it can be 
upper bounded by an LMMSE estimator, and lower bounded 
by a genie aided MMSE estimator. Monte Carlo simulations 
confirm the bounds, and show that the difference in perfor- 
mance between the optimal MMSE estimator and the LMMSE 
estimator may be substantial.

VIII. Appendix: Transforms of GM Distributed Random Vectors

In the literature, mixture distributions are often characterized
by a convex combination of probability density functions,
see e.g. [12], [13]. Since not all random variables can be
characterized by a probability density function (not all probability
measures have a density [14]), the results presented in this
appendix do not rely on probability densities. The results are
obtained using distributions (alias measures) and characteristic
functions, both of which always exist.

Propositions 1 and 2 can be found in similar form in [11].
The other propositions may well exist in the literature, but
we have not been able to find them. Since much of our work
depends on these propositions, it is natural to include them.

Definition 1: Finite Mixture distribution.
Let $\mathcal{K}$ be a finite index set. For each $k \in \mathcal{K}$, let $p_k$ be
the probability of drawing index k from $\mathcal{K}$, and let $P_k$ be
a probability distribution (or measure) on a Euclidean
(finite-dimensional vector) space X. Then, the convex combination

$$ P = \sum_{k \in \mathcal{K}} p_k P_k \qquad (39) $$

also defines a probability distribution on X. We call (39) a
finite mixture distribution on X.

Definition 2: Gaussian Mixture (GM) distribution.
When all component measures $\{P_k\}$ are Gaussian, we call (39)
a (finite) Gaussian mixture (GM) distribution. We indicate
that a random variable x is GM distributed by writing

$$ x \sim \sum_k p_k \, \mathcal{N}\big(u_x^{(k)}, C_{xx}^{(k)}\big), $$

where it is implicit that k belongs to a finite index set.

In the following, x denotes a vector in the sample space
X. We define all vectors as column vectors, and assume all
sample spaces to be continuous.

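Proposition 1: Mean of a mixture.
Suppose $P_k$ has finite mean

$$ u_x^{(k)} = \int_{x \in X} x \, P_k(dx). $$

Then the mixture distribution of (39) has mean

$$ u_x = \sum_{k \in \mathcal{K}} p_k u_x^{(k)}. $$

Proof:

$$ u_x = \int_{x \in X} x \sum_{k \in \mathcal{K}} p_k P_k(dx) = \sum_{k \in \mathcal{K}} p_k \int_{x \in X} x \, P_k(dx) = \sum_{k \in \mathcal{K}} p_k u_x^{(k)}. \; ■ $$
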

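Proposition 2: Covariance of a mixture.
Suppose $P_k$ has the finite mean $u_x^{(k)}$, and all elements of the
covariance matrix

$$ C_{xx}^{(k)} := \int_{x \in X} \big(x - u_x^{(k)}\big)\big(x - u_x^{(k)}\big)^T P_k(dx) $$

have finite magnitudes. Then, the covariance of the mixture
distribution (39) is

$$ C_{xx} = \sum_{k \in \mathcal{K}} p_k \left( C_{xx}^{(k)} + u_x^{(k)} u_x^{(k)T} \right) - u_x u_x^T. $$

Proof: We use the fact that $C_{xx} = E(x x^T) - E(x) E(x)^T$
always holds. Thus

$$ C_{xx} = \int_{x \in X} x x^T \sum_{k \in \mathcal{K}} p_k P_k(dx) - u_x u_x^T = \sum_{k \in \mathcal{K}} p_k \int_{x \in X} x x^T P_k(dx) - u_x u_x^T = \sum_{k \in \mathcal{K}} p_k \left( C_{xx}^{(k)} + u_x^{(k)} u_x^{(k)T} \right) - u_x u_x^T. \; ■ $$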
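Propositions 1 and 2 translate directly into a few lines of code. The following is a minimal sketch; the function name `mixture_moments` and the array layout (one row per component mean, one matrix per component covariance) are assumptions made for this illustration.

```python
import numpy as np

def mixture_moments(weights, means, covs):
    """Mean and covariance of a finite mixture (Propositions 1 and 2).

    weights: shape (K,), means: shape (K, d), covs: shape (K, d, d).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mean = weights @ means                                # sum_k p_k u^(k)
    outer = np.einsum('ki,kj->kij', means, means)         # u^(k) u^(k)T for each k
    second = np.einsum('k,kij->ij', weights, covs + outer)
    return mean, second - np.outer(mean, mean)

# Two scalar components at +3 and -3 with unit variance and equal weights.
mean, cov = mixture_moments([0.5, 0.5], [[3.0], [-3.0]], [[[1.0]], [[1.0]]])
print(mean, cov)  # mean 0, variance 1 + 9 = 10
```

The spread of the component means enters the mixture covariance through the outer products, which is why the mixture variance in this example is much larger than the component variances.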



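Proposition 3: Characteristic function of a GM distributed
random vector.

Let $x \sim \sum_k p_k \, \mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$. Then the characteristic
function of x is (see e.g. [13])

$$ \phi_x(t) = \sum_k p_k \, e^{\, i t^T u_x^{(k)} - \frac{1}{2} t^T C_{xx}^{(k)} t} $$

for any real vector t.

Proof: For any real vector t, the characteristic function
for $x \sim \mathcal{N}(u_x, C_{xx})$ is

$$ \phi(t) = \int e^{\, i t^T x} P(dx) = e^{\, i t^T u_x - \frac{1}{2} t^T C_{xx} t}, $$

where $P = \mathcal{N}(u_x, C_{xx})$. Now, if $x \sim \sum_k p_k \, \mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$,
then the characteristic function is

$$ \phi_x(t) = E\big[e^{\, i t^T x}\big] = \int e^{\, i t^T x} \sum_k p_k P_k(dx) = \sum_k p_k \, e^{\, i t^T u_x^{(k)} - \frac{1}{2} t^T C_{xx}^{(k)} t}. \; ■ $$
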

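Proposition 4: Joint distribution of independent GM
distributed random vectors.

Let $x \sim \sum_k p_k \, \mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$ and $y \sim \sum_r q_r \, \mathcal{N}(u_y^{(r)}, C_{yy}^{(r)})$,
where x and y are mutually independent. Then x and y
are jointly GM distributed as

$$ \begin{bmatrix} x \\ y \end{bmatrix} \sim \sum_{k,r} p_k q_r \, \mathcal{N}\left( \begin{bmatrix} u_x^{(k)} \\ u_y^{(r)} \end{bmatrix}, \begin{bmatrix} C_{xx}^{(k)} & 0 \\ 0 & C_{yy}^{(r)} \end{bmatrix} \right). $$

Proof: By Proposition 3, the characteristic functions of x
and y are

$$ \phi_x(t) = \sum_k p_k \, e^{\, i t^T u_x^{(k)} - \frac{1}{2} t^T C_{xx}^{(k)} t} \quad \text{and} \quad \phi_y(s) = \sum_r q_r \, e^{\, i s^T u_y^{(r)} - \frac{1}{2} s^T C_{yy}^{(r)} s}, $$

respectively. Because of the independence, the characteristic
function of the joint random vector $[x^T \; y^T]^T$ is

$$ \phi_{x,y}\left( \begin{bmatrix} t \\ s \end{bmatrix} \right) = \phi_x(t) \, \phi_y(s) = \sum_{k,r} p_k q_r \, e^{\, i \left( t^T u_x^{(k)} + s^T u_y^{(r)} \right) - \frac{1}{2} \left( t^T C_{xx}^{(k)} t + s^T C_{yy}^{(r)} s \right)} $$

$$ = \sum_{k,r} p_k q_r \exp\left( i \, [t^T \; s^T] \begin{bmatrix} u_x^{(k)} \\ u_y^{(r)} \end{bmatrix} - \frac{1}{2} [t^T \; s^T] \begin{bmatrix} C_{xx}^{(k)} & 0 \\ 0 & C_{yy}^{(r)} \end{bmatrix} \begin{bmatrix} t \\ s \end{bmatrix} \right) $$

for any real vector $[t^T \; s^T]^T$. ■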
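Proposition 5: Affine transform of a GM distributed
random vector.

Let $y = Dx + a$ where $x \sim \sum_k p_k \, \mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$. Then
$y \sim \sum_k p_k \, \mathcal{N}(D u_x^{(k)} + a, \; D C_{xx}^{(k)} D^T)$.

Proof:

$$ \phi_y(t) = E\big[e^{\, i t^T (Dx + a)}\big] = e^{\, i t^T a} \sum_k p_k \, e^{\, i (D^T t)^T u_x^{(k)} - \frac{1}{2} (D^T t)^T C_{xx}^{(k)} (D^T t)} = \sum_k p_k \, e^{\, i t^T (D u_x^{(k)} + a) - \frac{1}{2} t^T D C_{xx}^{(k)} D^T t}. \; ■ $$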
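Proposition 5 says that an affine map acts on a GM component by component, leaving the weights untouched. The sketch below illustrates this and reuses it to build the joint parameters of $[y^T \; x^T]^T$ from those of $[x^T \; n^T]^T$, as done in section III. The function name `gm_affine` and the scalar example values are assumptions made for this illustration.

```python
import numpy as np

def gm_affine(weights, means, covs, D, a):
    """Parameters of y = D x + a when x is GM distributed (Proposition 5)."""
    new_means = [D @ u + a for u in means]       # D u^(k) + a
    new_covs = [D @ C @ D.T for C in covs]       # D C^(k) D^T
    return list(weights), new_means, new_covs

# One joint component of [x; n] with scalar x and n (assumed example values).
H = np.array([[2.0]])
joint_mean = np.array([1.0, 0.5])                # [u_x; u_n]
joint_cov = np.diag([1.0, 0.25])                 # blockdiag(C_xx, C_nn)

# [y; x] = [[H, I], [I, 0]] [x; n], with no offset.
D = np.block([[H, np.eye(1)], [np.eye(1), np.zeros((1, 1))]])
w, m, C = gm_affine([1.0], [joint_mean], [joint_cov], D, np.zeros(2))
print(m[0])  # [H u_x + u_n, u_x] = [2.5, 1.0]
print(C[0])  # [[H C_xx H^T + C_nn, H C_xx], [C_xx H^T, C_xx]] = [[4.25, 2], [2, 1]]
```

Choosing D as a row selection matrix such as $[I_p \; 0]$ gives the marginalization used in Proposition 6 as a special case.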



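Proposition 6: Marginal distribution of a GM distribution.

Let $x \sim \sum_k p_k \, \mathcal{N}(u_x^{(k)}, C_{xx}^{(k)})$. Partition x into two subvectors
such that

$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad u_x^{(k)} = \begin{bmatrix} u_{x_1}^{(k)} \\ u_{x_2}^{(k)} \end{bmatrix} \quad \text{and} \quad C_{xx}^{(k)} = \begin{bmatrix} C_{x_1 x_1}^{(k)} & C_{x_1 x_2}^{(k)} \\ C_{x_2 x_1}^{(k)} & C_{x_2 x_2}^{(k)} \end{bmatrix}. $$

Then the marginal distribution for $x_1$ is

$$ x_1 \sim \sum_k p_k \, \mathcal{N}\big(u_{x_1}^{(k)}, C_{x_1 x_1}^{(k)}\big). $$

Proof: Without loss of generality, assume that $x_1$ contains
the p first elements of x. Let

$$ D = \begin{bmatrix} I_p & 0 \end{bmatrix}. $$

Then $x_1 = Dx$, and by Proposition 5 the statement is proved. ■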